Separating text and background in degraded document images - a comparison of global thresholding techniques for multi-stage thresholding
Abstract
Before any processing of the textual content of a document image can be performed, the text must be separated from the background of the image. Several thresholding algorithms have previously been proposed and are widely used in document processing, but none has been shown to be effective at thresholding difficult documents in which the background and foreground are non-uniform. In this paper we investigate the use of three global thresholding algorithms (Otsu’s, Kapur’s entropy and Solihin’s quadratic integral ratio (QIR)) as the first stage of a multi-stage thresholding algorithm for degraded document images. We conclude that Otsu’s and Kapur’s algorithms do not work well for difficult documents because they tend to over-threshold the image, losing much of the useful information. The QIR algorithm separates the foreground and background of these images more accurately, leaving a range of undecided (fuzzy) pixels to be resolved in a subsequent stage.
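For readers unfamiliar with the two global methods that the abstract finds prone to over-thresholding, the sketch below computes Otsu's threshold (maximum between-class variance) and Kapur's threshold (maximum sum of the two class entropies) from a grey-level histogram. It is a minimal illustration under stated assumptions, not the authors' implementation: `otsu_threshold`, `kapur_threshold` and `label_pixel` are hypothetical names, dark text on a lighter background is assumed, and `label_pixel` with its `low`/`high` bounds only mimics the idea of an undecided band left for a later stage. It does not implement Solihin's QIR.

```python
import math
from typing import Sequence


def otsu_threshold(hist: Sequence[int]) -> int:
    """Otsu: return t maximising between-class variance (class 0 = levels <= t)."""
    total = sum(hist)
    if total == 0:
        raise ValueError("empty histogram")
    sum_all = sum(i * h for i, h in enumerate(hist))
    w0, sum0 = 0, 0.0  # pixel count and intensity sum of class 0
    best_t, best_var = 0, -1.0
    for t, h in enumerate(hist):
        w0 += h
        sum0 += t * h
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue  # one class is empty, so no valid split at this level
        mu0 = sum0 / w0
        mu1 = (sum_all - sum0) / w1
        # Between-class variance, up to the constant factor 1 / total**2.
        var_between = w0 * w1 * (mu0 - mu1) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t


def kapur_threshold(hist: Sequence[int]) -> int:
    """Kapur: return t maximising H0 + H1, the entropies of the two classes."""
    total = sum(hist)
    if total == 0:
        raise ValueError("empty histogram")
    p = [h / total for h in hist]
    best_t, best_h = 0, -math.inf
    for t in range(len(p) - 1):
        p0 = sum(p[: t + 1])
        p1 = sum(p[t + 1 :])
        if p0 == 0 or p1 == 0:
            continue
        # Entropy of each class, using probabilities renormalised within the class.
        h0 = -sum((q / p0) * math.log(q / p0) for q in p[: t + 1] if q > 0)
        h1 = -sum((q / p1) * math.log(q / p1) for q in p[t + 1 :] if q > 0)
        if h0 + h1 > best_h:
            best_h, best_t = h0 + h1, t
    return best_t


def label_pixel(value: int, low: int, high: int) -> str:
    """Hypothetical three-way labelling: grey levels in (low, high] stay undecided."""
    if value <= low:
        return "foreground"  # assumes dark text
    if value > high:
        return "background"
    return "undecided"


if __name__ == "__main__":
    # Synthetic bimodal histogram: dark text at level 50, light paper at level 200.
    hist = [0] * 256
    hist[50] = 100
    hist[200] = 100
    print(otsu_threshold(hist), kapur_threshold(hist))
    print(label_pixel(120, low=80, high=160))
```

A single global threshold forces every pixel into one of two classes, which is where over-thresholding loses faint strokes; the `label_pixel` helper shows, in the simplest possible form, how a first stage could instead defer ambiguous grey levels to a later, presumably local, stage.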